Efficient Keyword Search on Large RDF Data Using Optimization Technique

Authors

  • Leya Zacharias
  • Neema George
Abstract

Keyword search is an emerging topic in data mining. Recent keyword search techniques on the Semantic Web are moving away from shallow, information-retrieval-style approaches that merely find "keyword matches" towards more interpretive approaches that attempt to induce structure from keyword queries. Exploiting identity links among RDF resources allows applications to integrate data efficiently. Keys can be very useful for discovering these identity links. A set of properties is considered a key when its values uniquely identify resources. However, these keys are usually not available. Approaches that attempt to discover keys automatically can easily be overwhelmed by the size of the data, and they require clean data. RDF data can be summarized using a summarization algorithm. Here, searching is done with an optimization technique, so that the results are accurate and efficient and the time complexity is reduced. Unlike other techniques, our search algorithms always return correct results. A genetic algorithm is used to make the search more efficient and to obtain accurate results within the time bound.

Keywords: Keyword search, RDF data, Genetic algorithm.

INTRODUCTION

Data mining has become an essential research area in computer science, as it greatly helps keyword searching. More and more data is now provided in RDF format, so storing large amounts of RDF data and efficiently processing queries on it is becoming crucial. However, such queries do not always return the results that users are actually looking for. The amount of data published on the Semantic Web has grown at increasing rates in recent years.
This is mainly due to the activities of the Linked Data community and the adoption of RDF by major web publishers. The amount of data to be managed is stretching the scalability limits of the triple stores that are conventionally used to manage Semantic Web data. At the same time, the Semantic Web is increasingly reaching end users who need efficient and effective access to large subsets of this data. These end users prefer simple but ambiguous natural language queries over highly selective, formal graph queries in SPARQL, the query language of triple stores. In a web search scenario, formulating SPARQL queries may not be feasible at all because of the heterogeneity of the data.

The Resource Description Framework (RDF) is the de facto standard for data representation on the Web. It is no surprise that we are inundated with large amounts of rapidly growing RDF data from disparate domains. For example, the Linked Open Data (LOD) initiative integrates billions of entities from hundreds of sources. Just one of these sources, the DBpedia dataset, describes more than 3.64 million things using more than 1 billion RDF triples, and it contains numerous keywords.

RDF may be understood as a general-purpose, schema-flexible model for describing metadata and graph-shaped information. RDF represents information in the form of statements (triples or quads). Every triple denotes an edge between two nodes in a graph. The quad position can be used to give statements identity or to place statements within a named graph. RDF provides some basic concepts for modeling information: statements are composed of a subject (a URI or a Blank Node), a predicate (always a URI), an object (a URI, Blank Node, or Literal value), and a context (a URI or a Blank Node). URIs are used to identify a particular resource, whereas Literal values describe constants such as character strings and may carry either a language code or a datatype.
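The statement model described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the class names `URI`, `BlankNode`, `Literal`, and `Quad`, the helper `edges`, and the `http://example.org/` identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    text: str
    language: Optional[str] = None   # e.g. "en"
    datatype: Optional[URI] = None   # e.g. an XML Schema datatype URI

    def __post_init__(self) -> None:
        # Assumption for this sketch: a literal carries a language code
        # or a datatype, but not both.
        if self.language is not None and self.datatype is not None:
            raise ValueError("a literal takes a language or a datatype, not both")


Node = Union[URI, BlankNode]


class Quad(NamedTuple):
    subject: Node                          # URI or blank node
    predicate: URI                         # always a URI
    obj: Union[URI, BlankNode, Literal]    # URI, blank node, or literal
    context: Optional[Node] = None         # named graph; None for a plain triple


def edges(quads: Iterable[Quad]) -> Iterator[Tuple[Node, URI, Node]]:
    """Yield one labeled graph edge per statement that links two nodes."""
    for q in quads:
        if not isinstance(q.obj, Literal):
            yield (q.subject, q.predicate, q.obj)


# Hypothetical example data.
paper = URI("http://example.org/paper1")
author = URI("http://example.org/author1")
statements = [
    Quad(paper, URI("http://example.org/hasAuthor"), author),
    Quad(paper, URI("http://example.org/title"), Literal("Keyword search", language="en")),
]
print(len(list(edges(statements))))  # only the hasAuthor statement links two nodes
```

Treating literal-valued statements as node attributes rather than edges is one common design choice; a keyword index would typically be built over those literal values.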
International Journal of Engineering Research and General Science Volume 3, Issue 4, Part-2, July-August, 2015 ISSN 2091-2730 390 www.ijergs.org

Keyword-based queries over semi-structured data are an important function of modern information management. When this information is available as Linked Data, it can be abstracted as a graph. Generally speaking, the results of queries over this type of data are substructures of the graph that contain the searched keywords in their nodes. The task of indexing and finding the individual nodes that contain those keywords is relatively inexpensive, and well-established tools are available for it. Ascertaining the connections between the selected nodes is a decomposable problem, but it involves expensive and time-consuming graph exploration. Moreover, it must be solved on the fly at query processing time. The scale at which these queries must be solved grows as more and more data is made accessible on the Web of Data. In addition, to allow for merging databases in the future and for executing queries that span databases with originally different schemas, those queries must rely only on the most basic format of the Linked Data model, the subject-predicate-object format.

The first and most important objective is to document the related work done in keyword search. It is worth noting that this work does not aim at surveying the field. Instead, it aims at giving the full picture of the evolution of keyword searching, initiated by the field of IR and then adopted by the fields of web and databases, and at showing how each field has contributed to the others and to keyword searching itself. This way, the demands and trends of each research period can be identified and assessed, together with the research problems that each area has faced. The second objective of the dissertation is to advance the state-of-the-art research in keyword search over RDF data.
To this end, the contributions of this dissertation lie in the design, implementation, and evaluation of a system supporting keyword searching over RDF data. The third and final objective is to shed light on the evaluation of systems and techniques targeting keyword search over structured and semi-structured data. Keyword search provides only an approximate description of the information items to be retrieved. Hence, the correctness of the retrieval cannot be formally verified, as is the case with query languages such as SQL. Instead, retrieval effectiveness is measured by user perception and experience.

The efforts of this work are twofold:

1. Provide a mechanism to store sizable RDF graphs in a distributed way. We developed a mechanism that partitions and summarizes (shards) the RDF graph and persists the data in separate, distributed data stores.
2. Provide a measurable keyword-based search mechanism for RDF graphs. We use an optimization processing mechanism to provide a scalable solution to the task of building keyword indexes for RDF datasets.

The rest of the paper is organized as follows. In Section 1, we describe the related work. In Section 2, we present the search using a genetic algorithm. Section 3 reports extensive experimental results to support the proposed search using a genetic algorithm. Finally, in Section 4, we summarize the present study and draw some conclusions.

I. RELATED WORK

Conventional keyword search engines are limited to a specific information model and cannot easily adapt to unstructured, semi-structured, or structured data. One line of work proposed an effective and adaptive keyword search method, called EASE [6], for indexing and querying large collections of heterogeneous data. To achieve high efficiency in processing keyword queries, it first models unstructured, semi-structured, and structured data as graphs. It then summarizes the graphs and creates graph indices instead of using traditional inverted indices.
It proposed an extended inverted index to facilitate keyword-based search, and presented a novel ranking mechanism for enhancing search effectiveness. An extensive experimental study was conducted using real datasets. The results show that EASE achieves both high search efficiency and high accuracy, and significantly outperforms the existing approaches.

Query processing over graph-structured information is enjoying a growing number of applications. A top-k keyword search query on a graph finds the top k answers according to some ranking criteria [2]. Each answer is a substructure of the graph containing all query keywords. Current techniques for supporting such queries on general graphs suffer from several drawbacks, for example poor worst-case performance, not taking full advantage of indexes, and high memory requirements. To deal with these problems, BLINKS was proposed, a bi-level indexing and query processing scheme for top-k keyword search on graphs. BLINKS follows a search strategy with provable performance bounds, while additionally exploiting a bi-level index for pruning and accelerating the search. To reduce the index space, BLINKS partitions a data graph into blocks: the bi-level index stores summary information at the block level to initiate and guide search among blocks, and more detailed information for each block to accelerate search within blocks.

Applications in which plain text coexists with structured information are pervasive. Commercial relational database management systems (RDBMSs) generally provide querying capabilities for text attributes that incorporate state-of-the-art information retrieval (IR) relevance ranking strategies.
But this search functionality requires that queries specify the exact column or columns against which a given list of keywords is to be matched [5]. This requirement can be cumbersome and inflexible from a user's point of view. Good answers to a keyword query might need to be "assembled", perhaps in unforeseen ways, by joining tuples from multiple relations. This observation has motivated recent research on free-form keyword search over RDBMSs. The work in [5] adapts IR-style document-relevance ranking strategies to the problem of processing free-form keyword queries over RDBMSs. Its query model can handle queries with both AND and OR semantics, and it exploits the sophisticated single-column text-search functionality often available in commercial RDBMSs. It develops query-processing strategies that build on a crucial characteristic of IR-style keyword search: only the few most relevant matches, according to some definition of "relevance", are generally of interest. As a result, rather than computing all matches for a keyword query, which leads to inefficient executions, its techniques focus on the top-k matches for the query, for moderate values of k.

With the growing volume of text information stored in relational databases, there is a huge demand for RDBMSs to support keyword queries over text data [4]. Because a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly. The work in [4] studies the effectiveness and efficiency issues of answering top-k keyword queries in relational database systems. It proposes a new ranking formula by adapting existing IR techniques based on a natural notion of virtual document. Compared with prior approaches, the new ranking method is simple yet effective, and it agrees with human perceptions.
It also studies efficient query processing methods for the new ranking method and proposes algorithms that make minimal accesses to the database. Extensive experiments were carried out on large-scale real databases using two popular RDBMSs. The results demonstrate significant improvement over the alternative approaches in terms of retrieval effectiveness and efficiency.

We design a scalable and exact solution that handles practical RDF data sets with tens of millions of triples. To address the scalability issues, our solution builds a new, succinct, and efficient summary from the underlying RDF graph based on its types. Given a keyword search query, we use the summary [1] to prune the search space, leading to much better efficiency compared to a baseline solution. To sum up our contributions: we identify and address limitations in the existing state-of-the-art methods for keyword search in RDF data, and show that these limitations can lead to incomplete and incorrect answers on real RDF data sets. We propose a new, correct baseline solution based on the backward search idea. We develop efficient algorithms to summarize the structure of RDF data based on the types in RDF graphs, and use the summary to speed up the search; this is more scalable and lends significant pruning power without sacrificing the soundness of the result. The summary is lightweight and updatable. In addition, by applying an optimization algorithm to the search result, we obtain the most appropriate and efficient result. The optimization here is performed with a genetic algorithm.
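The backward search idea mentioned above can be illustrated with a minimal sketch. It is not the authors' algorithm: it leaves out the type-based summary and the genetic algorithm entirely, and the function name `backward_search`, the whole-word matching rule, and the toy graph are assumptions made for the example. Starting from the nodes that match each keyword, it walks edges backwards and returns the node that reaches every keyword with the fewest total hops.

```python
from collections import deque


def backward_search(edges, labels, keywords):
    """Return (root, total_hops) for the best answer root, or None if no node
    reaches every keyword.

    edges    -- iterable of (source, target) directed pairs
    labels   -- dict mapping node -> text attached to that node
    keywords -- list of query keywords
    """
    # Reverse adjacency list: target -> list of sources.
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, []).append(src)

    distances = []  # one {node: hops to nearest match} dict per keyword
    for kw in keywords:
        starts = [n for n, text in labels.items()
                  if kw.lower() in text.lower().split()]
        dist = {n: 0 for n in starts}
        queue = deque(starts)
        while queue:  # breadth-first expansion against the edge direction
            node = queue.popleft()
            for prev in reverse.get(node, []):
                if prev not in dist:
                    dist[prev] = dist[node] + 1
                    queue.append(prev)
        distances.append(dist)

    if not distances:
        return None
    # A candidate root must reach at least one match for every keyword.
    candidates = set(distances[0])
    for dist in distances[1:]:
        candidates &= set(dist)
    if not candidates:
        return None

    def cost(node):
        return sum(dist[node] for dist in distances)

    # Ties are broken by node name so the result is deterministic.
    best = min(candidates, key=lambda n: (cost(n), n))
    return best, cost(best)


# Hypothetical toy graph: two papers, one author, one topic.
toy_edges = [("paper1", "author1"), ("paper1", "topic1"), ("paper2", "author1")]
toy_labels = {
    "paper1": "keyword search paper",
    "paper2": "another paper",
    "author1": "alice smith",
    "topic1": "genetic algorithm",
}
print(backward_search(toy_edges, toy_labels, ["smith", "genetic"]))
```

In this toy graph only `paper1` can reach both a node matching "smith" and a node matching "genetic", so it is the only candidate root. A summary-based approach as described in the text would aim to rule out unpromising regions of the graph before running this kind of exploration.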

Similar resources

An Effective Path-aware Approach for Keyword Search over Data Graphs

Abstract—Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...

Scalable Keyword Search on Big RDF Data

Keyword search is a useful tool for exploring large RDF datasets. Existing techniques either rely on constructing a distance matrix for pruning the search space or building summarization from the RDF graphs for query processing. In this work, we show that existing techniques have serious limitations in dealing with realistic, large RDF graphs with tens of millions of triples. Furthermore, the e...

Effective searching of RDF knowledge bases

RDF data has become a vital source of information for many applications. In this thesis, we present a set of models and algorithms to effectively search large RDF knowledge bases. These knowledge bases contain a large set of subject-predicate-object (SPO) triples where subjects and objects are entities and predicates express relationships between them. Searching such knowledge bases can be done ...

RDF Keyword Search Using a Type-based Summary

Keyword search enjoys great popularity due to succinctness and easy operability for exploring RDF data. SPARQL has been recommended as the standard query language that can retrieve any answers users need from available RDF data. Thus, keyword search based on keywords-to-SPARQL attracts more and more attention. However, existing solutions have main limitations that the summary index used for tra...

Top-k Exploration of Query Graph Candidates for Efficient Keyword Search on RDF

Keyword queries enjoy widespread usage as they represent an intuitive way of specifying information needs. Recently, answering keyword queries on graph-structured data has emerged as an important research topic. The prevalent approaches build on dedicated indexing techniques as well as search algorithms aiming at finding substructures that connect the data elements matching the keywords. While ...

Publication date: 2015